Use of self-administered instruments to assess 
psychiatric disorders in older people: validity of 
the General Health Questionnaire, the Center 
for Epidemiologic Studies Depression Scale and 
the self-completion version of the revised Clinical 
Interview Schedule 

Background. Diagnosis of depressive disorder using interviewer-administered instruments is expensive and frequently 
impractical in large epidemiological surveys. The aim of this study was to assess the validity of three self-completion 
measures of depressive disorder and other psychiatric disorders in older people against an interviewer-administered 
instrument. 

Method. A random sample stratified by sex, age and social position was selected from the Whitehall II study 
participants. This sample was supplemented by inclusion of depressed Whitehall II participants. Depressive disorder 
and other mental disorders were assessed by the interviewer-administered structured revised Clinical Interview 
Schedule (CIS-R) in 277 participants aged 58-80 years. Participants also completed a computerized self-completion 
version of the CIS-R in addition to the General Health Questionnaire (GHQ) and the Center for Epidemiologic 
Studies Depression Scale (CES-D). 

Results. The mean total score was similar for the interviewer-administered (4.43) and self-completion (4.35) versions 
of the CIS-R [95% confidence interval (CI) for difference -0.31 to 0.16]. Differences were not related to sex, age, social 
position or presence of chronic physical illness. Sensitivity/specificity of self-completion CIS-R was 74%/98% for any 
mental disorder and 75%/98% for depressive episode. The corresponding figures were 86%/87% and 78%/83% for 
GHQ and 77%/89% and 89%/86% for CES-D. 

Conclusions. The self-completion computerized version of the CIS-R is feasible and has good validity as a measure of 
any mental disorder and depression in people aged ≥60 years. GHQ and CES-D also have good criterion validity as 
measures of any mental disorder and depressive disorder respectively. 

Introduction 

Structured diagnostic interviews, such as the 
Composite International Diagnostic Interview (CIDI; 
Wittchen, 1994) and the revised Clinical Interview 
Schedule (CIS-R; Lewis et al. 1992), are considered by 
many researchers to be the most valid and reliable 
methods for the assessment of mental disorders in 
populations according to diagnostic criteria (ICD-10 
or DSM-IV). The CIS-R has been widely used in the 
UK (Brugha et al. 1999) whereas the CIDI has been 
more commonly used in the USA (Haro et al. 2006). 
In comparisons with semi-structured clinical evalu- 
ations, the CIS-R has been shown to be a valid measure 
of mental disorders (Patton et al. 1999; Jordanova et al. 
2004; Brugha et al. 2005; Pez et al. 2010). 

However, structured interviews such as the CIDI 
and the CIS-R may be expensive and impractical to 
use in large, epidemiological studies. Large-scale sur- 
veys have therefore often relied on self-administered 
instruments to identify psychiatric illness and morbidity, 
despite concerns about the validity and reliability 
of these measures. Although some studies have 
demonstrated that self-administered instruments are 
valid in younger and middle-aged adults (Goldberg 
& Williams, 1988; Stansfeld & Marmot, 1992) and 
have compared self-completion and interviewer ver- 
sions of either the CIS-R or the CIDI (Lewis et al. 
1988; Lewis, 1994; Peters et al. 1998), few studies 
have investigated their validity in older populations. 

In this study, we tested whether a computerized self- 
completion version of the CIS-R (Lewis et al. 1988) was 
a feasible and valid instrument for identifying mental 
disorders in older adults by comparing results with 
the interview-administered CIS-R, considered to be 
the reference standard in this study. In addition, we 
examined the sensitivity and specificity of two commonly 
used self-completion questionnaires, the Center 
for Epidemiologic Studies Depression Scale (CES-D; 
Radloff, 1977) and the General Health Questionnaire 
(GHQ; Goldberg, 1972), as measures of psychiatric dis- 
orders in a UK population aged 58-80 years. 

Method 

The Whitehall II study 

The Whitehall II study is a cohort of 10308 originally 
London-based civil servants (6895 men and 3413 
women) aged between 35 and 55 years in 20 
London-based civil service departments, established 
between 1985 and 1988 (phase 1) (Marmot et al. 
1991). The validation study reported in this paper 
was conducted at phase 10 of the Whitehall II study 
in 2011. The main aims of phase 10 were to (1) validate 
self-completion measures of psychiatric morbidity, in 
addition to several other screening measures, in older 
people and (2) to invite a subsample of participants 
to take part in a neuroimaging study of late-onset 
depression cases and never-depressed controls. 

Study sample 

The sample (phase 10) was selected from the Whitehall 
II cohort. We drew a random sample of 255 persons, 
stratified by sex, age and social position (most recent 
employment grade) from among the 5390 cohort mem- 
bers who attended the phase 9 follow-up examination 
in 2008-2009. To obtain a sufficient number of de- 
pressed adults, we supplemented this sample by 
inclusion of all participants with evidence of late-onset 
depressive symptoms in the 2008-2009 follow-up. 

Of the 5390 who attended the phase 9 screening 
examination, 88 participants were classified as having 
late-onset depressive symptoms; as six of these 88 
were already selected in the random sample of 255, 
this gave a total supplemented sample of 337. Three 
of the 337 were living overseas and therefore were 
not invited to participate; a further four people died 
before being contacted. Thus, 330 people were eligible 
and were invited to participate at phase 10; of these, 
277 took part (response rate 84%). 
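
The sample flow described above can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the variable names are illustrative and are not taken from the study's own materials.

```python
# Sample flow for the phase 10 validation study, using counts from the text.
random_sample = 255        # stratified random sample from phase 9 attendees
late_onset_cases = 88      # late-onset depressive symptoms at phase 9
overlap = 6                # late-onset cases already in the random sample

supplemented = random_sample + late_onset_cases - overlap

living_overseas = 3        # not invited
died_before_contact = 4
eligible = supplemented - living_overseas - died_before_contact

participated = 277
response_rate = participated / eligible

print(supplemented, eligible, f"{response_rate:.0%}")
```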

Study procedures and measures 

Self-completion questionnaires, including the CES-D 
and the GHQ, were sent out in December 2010 along 
with invitation letters to attend a screening clinic. 
Participants were asked to bring along their completed 
questionnaires to hand in at the clinic. According to 
the recorded date of questionnaire completion, the 
majority of participants completed their postal ques- 
tionnaires shortly before their screening clinic appoint- 
ment (median 2 days apart, 87% less than 30 days 
apart). Between 31 January 2011 and 14 March 2011, 
participants attended screening where they completed 
both the interviewer-administered and the computer- 
ized self-completion versions of the CIS-R. We allo- 
cated participants randomly to complete either the 
interviewer version first or the computerized self- 
completion version first. A potential limitation of our 
study is that, to reduce respondent burden, both ver- 
sions of the CIS-R were administered on the same 
day. However, participants were administered the 
other phase-10 measures in between the first and 
second CIS-R versions to reduce the risk that the 
respondent recalled their answer to the first version. 
Participants were offered tea and biscuits at the end 
of the phase-10 screening but no financial incentives 
were offered for participation. Ethical approval for 
the Whitehall II study was obtained from the Univer- 
sity College London Medical School committee on 
the ethics of human research, and all participants 
gave informed written consent. 

The CIS-R is a structured diagnostic interview for 
common mental disorders, formerly termed neurotic 
disorders (Lewis et al. 1992). Because its questions and 
responses are fully structured, a computerized 
self-completion version is also available 
(Lewis et al. 1988). Both versions generate scores on 14 
psychiatric symptoms (listed in Table 2), a total score 
and ICD-10-based diagnoses of depressive and other 
common mental disorders (listed in Table 1), thus 
providing measures of both severity and the 
presence or absence of mental disorders. 

A CIS-R total score ≥12 was used to define cases 
with any common mental disorder (Lewis et al. 1992). 
The wording of the questions and responses was 
the same in the computerized self-completion and 
interviewer-administered versions but interviewers 
used show cards listing response options for questions 
that were sensitive or had several possible responses. 

Table 1. Prevalence of mental disorders 

CIS-R, Clinical Interview Schedule Revised; CI, confidence interval; CES-D, Center for Epidemiologic Studies Depression Scale; 
GHQ, General Health Questionnaire. 

a There were two cases of panic disorder (ICD-10 F41.0) and no cases of obsessive-compulsive disorder (ICD-10 F42). 



Both versions were administered using the program 
PROQSY (Lewis et al. 1988). A laptop computer was 
used for the interviewer version with interviewers 
reading questions from the screen and entering 
responses directly. A desktop computer was used for 
the self-completion version. Start and end times were 
recorded so that completion times could be compared. 
Interviewers were given 2 hours of training in the use 
of the CIS-R, which included a practice session. All 
interviewers were given a written protocol to follow 
and had the opportunity for further practice interviews 
during the pilot of the phase-10 data collection. Less 
than 1% of participants were given help with the com- 
puterized version because of eyesight problems or 
other problems with using a computer. 

The 20-item CES-D is a short self-report questionnaire 
designed to measure depressive symptoms in 
the general population (Radloff, 1977). Participants 
were asked to score the frequency of occurrence 
of specific symptoms during the previous week on a 
four-point scale, where 0 = 'less than 1 day', 1 = '1-2 
days', 2 = '3-4 days' and 3 = '5-7 days'. These were 
summed to yield a total score between 0 and 60. 
Participants scoring ≥16 were categorized as cases of 
CES-D depression (Stansfeld et al. 2008). The CES-D 
was included at phases 7, 9 and 10. 
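
The CES-D scoring rule described above can be sketched as follows. This is a minimal illustration, not the study's own code: it assumes that each of the 20 responses has already been coded 0 to 3 in the direction of greater symptom frequency (the text does not describe item-level coding), and the function names are hypothetical.

```python
from typing import Sequence

CESD_ITEMS = 20
CESD_CASE_THRESHOLD = 16  # cases score >= 16, as stated in the text


def cesd_total(responses: Sequence[int]) -> int:
    """Sum 20 item responses, each assumed to be already coded 0-3."""
    if len(responses) != CESD_ITEMS:
        raise ValueError(f"expected {CESD_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2 or 3")
    return sum(responses)


def is_cesd_case(responses: Sequence[int]) -> bool:
    """Apply the >= 16 cut-off to the total score."""
    return cesd_total(responses) >= CESD_CASE_THRESHOLD


# Hypothetical respondent: eight symptoms on 1-2 days, four on 3-4 days,
# and the remaining eight on less than 1 day.
example = [1] * 8 + [2] * 4 + [0] * 8
print(cesd_total(example), is_cesd_case(example))
```

The hypothetical respondent totals exactly 16 and is therefore classified as a case, which shows that the cut-off is inclusive.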

The 30-item GHQ is a well-established screening 
questionnaire for common mental disorder, suitable 
for use in general population samples (Goldberg, 
1972). The GHQ was included in all study phases 
1-10 with the exception of phase 4. At phase 1 of the 
study, this was validated against the CIS in a subsample 
and, on the basis of receiver operating characteristic 
(ROC) analysis, those scoring ≥5 were 
deemed GHQ cases (Stansfeld & Marmot, 1992). A 
four-item depression subscale (Cronbach's α = 0.88) 
was identified from the 30-item GHQ on the basis of 
factor analysis and comparison with the items of the 
depression subscale of the 28-item GHQ (Goldberg & 
Hillier, 1979). A total depression score (ranging from 
0 to 12) was derived by summing responses to these 
four items using Likert scoring (0 to 3) for each item. 
Participants scoring ≥2 were categorized as cases of 
GHQ depression (Stansfeld et al. 1998). 

Table 2. Agreement between self-completion and interviewer versions for the total CIS-R score and symptom scores (n=274) 

CIS-R, Clinical Interview Schedule Revised; CI, confidence interval. 
a Total score ranges from 0 to 57. 
b Symptom scores range from 0 to 4 (depressive ideas symptom score 0 to 5). 

A measure of early-onset depressive symptoms was 
derived from GHQ measures at phases 1-9 and 
defined as two or more reports of GHQ caseness and/or 
two or more reports of GHQ depression subscale 
caseness before age 60. Late-onset depression was 
defined as being a CES-D case at phase 9 AND having 
no early-onset depressive symptoms AND no prevalent 
serious chronic conditions (coronary heart disease, 
cancer, stroke). 
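
The two definitions above combine several conditions, so it may help to see them written out as boolean logic. The sketch below is an illustration under stated assumptions: the record structure and field names are hypothetical, and the counts of caseness reports are assumed to have been derived already from the phase 1-9 data.

```python
from dataclasses import dataclass


@dataclass
class DepressionHistory:
    """Hypothetical per-participant summary; field names are illustrative."""
    ghq_case_reports_before_60: int       # GHQ caseness reports before age 60
    ghq_dep_case_reports_before_60: int   # GHQ depression subscale caseness reports
    cesd_case_phase9: bool                # CES-D case at phase 9
    serious_chronic_condition: bool       # coronary heart disease, cancer or stroke


def early_onset(h: DepressionHistory) -> bool:
    # Two or more reports of GHQ caseness and/or GHQ depression caseness.
    return (h.ghq_case_reports_before_60 >= 2
            or h.ghq_dep_case_reports_before_60 >= 2)


def late_onset(h: DepressionHistory) -> bool:
    # CES-D case at phase 9 AND no early-onset symptoms AND no serious condition.
    return (h.cesd_case_phase9
            and not early_onset(h)
            and not h.serious_chronic_condition)


first_episode_late = DepressionHistory(1, 0, True, False)
recurrent = DepressionHistory(3, 2, True, False)
print(late_onset(first_episode_late), late_onset(recurrent))
```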

Statistical analysis 

For each mental disorder we computed estimates of 
raw prevalence, weighted prevalence to adjust for 
oversampling of depressed cases and prevalence in 
the randomly selected cohort subsample. Differences 
in prevalence estimates between the self-completion 
measures and interviewer CIS-R were tested using 
McNemar's χ² test. Differences in mean total scores 
and specific symptom scores between the self- 
completed and interviewer CIS-R were examined 
with the paired t test. The agreement of scores between 
the two versions was assessed with the weighted κ 
statistic. Linear regression with difference in CIS-R 
score between the two versions was used to test for 
evidence that differences in method of administration 
were related to age, sex, employment grade or presence 
of chronic physical illness. We performed ROC 
analysis to compute estimates of sensitivity, specificity, 
positive predictive value (+PV), negative predictive 
value (-PV) and area under the ROC curve (AUC) 
for all self-completion measures of any mental disorder 
and specific mental disorders using the interviewer- 
administered CIS-R as the criterion. Based on published 
guidelines, we considered AUC values ≥0.90 
to indicate excellent validity and values ≥0.80 but 
<0.90 to indicate good validity (Metz, 1978). We 
checked the cut-off points of scores ≥16 and scores 
≥5 used to define CES-D and GHQ cases respectively 
by ROC analyses. Analyses were performed using 
Stata version 12 (StataCorp, USA). 
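
As an illustration of the paired comparison of prevalence estimates, McNemar's test depends only on the discordant pairs, that is, participants classified as a case by one measure but not the other. The sketch below uses the uncorrected χ² form with 1 degree of freedom; whether a continuity correction was applied in the original analysis is not stated, and the discordant counts in the example are hypothetical.

```python
import math
from typing import Tuple


def mcnemar_chi2(b: int, c: int) -> Tuple[float, float]:
    """McNemar's chi-squared test (1 df, no continuity correction).

    b: participants who are cases on the self-completion measure only.
    c: participants who are cases on the interviewer measure only.
    Returns the test statistic and its two-sided p value.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-squared distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical discordant counts, for illustration only.
stat, p_value = mcnemar_chi2(b=26, c=5)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```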

Results 

Of the 330 persons invited, 277 attended the examin- 
ation (response rate 84%) and 274 had complete data 
on both interviewer-administered and self-completion 
versions of the CIS-R. The mean age was 69.1 (s.d. = 5.8) 
years for participants allocated to the self- 
completion CIS-R version first and 68.3 (s.d. = 6.2) for 
participants allocated to the self-completion CIS-R ver- 
sion second. Among participants allocated to the self- 
completion version first (second), 31% (28%) were 
female; the most recent employment grade was high 
for 42% (47%), middle for 45% (41%) and low for 
13% (13%); the proportion classified as GHQ cases 
was 21% (21%) and the proportion classified as 
CES-D cases was 17% (16%). Similarly, CIS-R mean 
total scores did not differ significantly according to 
order of administration of the two CIS-R versions. 

Table 1 presents the prevalence for each of the 
mental health measures. Based on the interviewer- 
administered CIS-R, 27 participants were diagnosed 
as having any mental disorder. The numbers of par- 
ticipants diagnosed as having specific disorders were: 
12 depressive episode; nine mixed anxiety and 
depressive disorder; 16 generalized anxiety disorder; 
five phobia; and two panic disorder. No participants 
were diagnosed with obsessive-compulsive disorder. 



Validity of self-completion CIS-R 

Table 2 shows the mean value for the total CIS-R score 
and each of the 14 symptom scores. The mean difference 
in the total score between self-completion CIS-R 
and interviewer CIS-R was small: the mean scores on 
the two versions were 4.35 and 4.43 respectively 
[95% confidence interval (CI) for difference in means 
-0.31 to 0.16, p=0.26, paired t test]. For 12 of the 14 
symptom scores, there was no significant difference 
according to method of administration. 
Differences for both fatigue and compulsions were 
statistically significant, with slightly lower scores on 
the self-completion version than the interviewer ver- 
sion. In a linear regression model, the difference in 
total CIS-R score between the two versions was not 
related to age, sex, social position or presence of 
chronic physical illness. 

Table 3 presents sensitivity and specificity figures 
for the self-completion CIS-R measures of any mental 
disorder and specific mental disorders. The sensitivity 
for any mental disorder was 74.1% and specificity 
98.4%. The corresponding figures for depressive epi- 
sode were 75.0% and 97.7% respectively. The self- 
completion CIS-R was also a sensitive and specific 
measure of all phobias (80%/98.1%), but its sen- 
sitivity was low for mixed anxiety and depressive dis- 
order and for generalized anxiety disorder. The 
specificity (>97%) was very high for all diagnostic 
categories. 



Validity of the CES-D and GHQ 

Table 4 shows that the CES-D is a sensitive and specific 
measure of any mental disorder (sensitivity/specificity 
77%/89%) and depressive episode (sensitivity/specificity 
89%/86%). This is also the case for the GHQ caseness 
(86%/87% for any mental disorder; 78%/83% for 
depressive episode). By contrast, the GHQ depression 
measure constructed from four items of the 30-item 
GHQ was not a sensitive measure for depressive 
episode, although the ROC analysis indicated that 
sensitivity for depressive episode was somewhat 
improved for a cut-point ≥2 (sensitivity/specificity 
56%/90%) in place of the cut-point >3 used in earlier 
studies (sensitivity/specificity 44%/93%). 

Table 4. Sensitivity and specificity for the self-completion CES-D and the GHQ as measures of any mental disorder and depressive episode 
with the interviewer-administered CIS-R as the criterion (n=256) 

CIS-R interviewer version   Sensitivity (%)    Specificity (%)    +PV    -PV    +LR    -LR    AUC (95% CI)

Any mental disorder
  CES-D case                77.3 (0.54-0.91)   88.9 (0.84-0.92)   0.39   0.98   6.95   0.26   0.83 (0.74-0.92)
  GHQ case                  86.4 (0.64-0.96)   87.2 (0.82-0.91)   0.39   0.99   6.74   0.16   0.87 (0.79-0.94)
  GHQ depression case       36.4 (0.18-0.59)   94.4 (0.80-0.97)   0.38   0.94   6.55   0.67   0.65 (0.55-0.76)

Depressive episode
  CES-D case                88.9 (0.51-0.99)   85.8 (0.81-0.90)   0.19   0.99   6.27   0.13   0.87 (0.76-0.98)
  GHQ case                  77.8 (0.40-0.96)   83.0 (0.78-0.87)   0.14   0.99   4.57   0.27   0.80 (0.66-0.95)
  GHQ depression case       44.4 (0.15-0.77)   93.1 (0.89-0.96)   0.19   0.98   6.46   0.60   0.69 (0.51-0.86)

CES-D, Center for Epidemiologic Studies Depression Scale; GHQ, General Health Questionnaire; CIS-R, Clinical Interview 
Schedule Revised; +PV, positive predictive value; -PV, negative predictive value; +LR, positive likelihood ratio; -LR, negative 
likelihood ratio; AUC, area under the receiver operating characteristic (ROC) curve; CI, confidence interval. 
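
The accuracy measures reported in Table 4 all follow from a 2×2 cross-classification of the screening result against the interviewer CIS-R diagnosis. The sketch below shows the calculations. The cell counts are not reported in the text: they are back-calculated here, as an assumption, to be consistent with the 'GHQ case' row for any mental disorder (n=256), and the class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Screening result cross-classified against the criterion diagnosis."""
    tp: int  # screen positive, criterion positive
    fp: int  # screen positive, criterion negative
    fn: int  # screen negative, criterion positive
    tn: int  # screen negative, criterion negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def lr_positive(self) -> float:
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()


# Assumed counts, inferred from the reported percentages rather than
# taken from the source: 22 criterion cases and 234 non-cases.
ghq_any = TwoByTwo(tp=19, fp=30, fn=3, tn=204)

print(f"sensitivity {ghq_any.sensitivity():.1%}, specificity {ghq_any.specificity():.1%}")
print(f"+PV {ghq_any.ppv():.2f}, -PV {ghq_any.npv():.2f}")
print(f"+LR {ghq_any.lr_positive():.2f}, -LR {ghq_any.lr_negative():.2f}")
```

With these assumed counts the six measures agree with the tabulated 'GHQ case' values to the precision reported. The low +PV alongside high sensitivity and specificity reflects the small number of criterion cases in the sample.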

Discussion 

Data from men and women aged 58-80 years show 
reasonably high sensitivity and specificity, varying 
between 74% and 98%, for the CES-D, the 30-item 
GHQ and the computerized self-completion version 
of the CIS-R as measures of any mental disorder and 
depressive episode. The computerized self-completion 
CIS-R was additionally a sensitive and specific meas- 
ure of phobias and accurately detected symptom sever- 
ity in 12 specific psychiatric symptoms. These findings 
suggest that several self-administered instruments, 
with reasonable criterion validity, may be used to 
screen for common mental disorders and depression 
in populations aged ≥60 years. Furthermore, the 
mean total score from the computerized self- 
completion version and the structured interview ver- 
sion were very similar. 

An earlier comparison of the computerized self- 
completion version of the CIS-R against the structured 
psychiatric interview in this population when they 
were aged 35-55 years showed slightly higher sensi- 
tivity (82%) and lower specificity (84%) (Lewis et al. 
1988). Previous studies on this measure have shown 
good agreement in severity score and case definition 
for any psychiatric disorder in primary care and occu- 
pational settings but these studies did not examine 
agreement for specific ICD-10 disorders such as de- 
pressive episode (Lewis et al. 1988; Lewis, 1994). We 
found that symptom scores were significantly lower 
on the self-completion version than the interviewer 
version for both fatigue and compulsions. This is in 
contrast to an earlier study where the only significant 
difference in the 14 symptom scores was for sleep 
symptoms (Lewis, 1994). It is possible that these 
findings are due to chance. 

According to a review of 28 studies, previous inves- 
tigations on the CES-D and GHQ have reported 
validity estimates comparable to those we observed 
(Williams et al. 2002). Our current findings are also in 
agreement with those obtained over 20 years ago for 
this cohort. At the baseline of the Whitehall II study 
when the participants were aged 35-55 years, the sen- 
sitivity of the GHQ against the CIS was 73% although 
specificity was slightly worse at 78% (Stansfeld & 
Marmot, 1992). In a vulnerable, very old population 
living in residential homes in The Netherlands, sen- 
sitivity for CES-D for depressive and/or anxiety 
disorders exceeded 80% but specificity was lower, 
at 61% (Dozeman et al. 2011). Among postpartum 
women, 60% sensitivity and 90% specificity were 
observed for the CES-D (Boyd et al. 2005). However, 
the validity of the CES-D has been lower in some 
(Klinkman et al. 1997; Thomas et al. 2001) but not all 
clinical samples (Stahl et al. 2008). 

Limitations and strengths of the study 

A limitation of this study is that participants were 
recruited from an occupational cohort so our findings 
may not apply to people who have not had paid 
employment. Our sample was relatively healthy and 
consisted of people able to travel to our London clinic. 
Estimates of sensitivity were imprecise for specific 
anxiety disorders because of the small number of 
people diagnosed with these disorders in this sample. 
We considered the interviewer-administered version 
to be the 'gold standard' criterion although this is 
somewhat arbitrary as it is possible that people may 
be more likely to under-report symptoms in an inter- 
viewer-administered version than in a self-completion 
version. Given this limitation, our study could alterna- 
tively be described as a reliability, method-comparison 
or concordance study. Furthermore, the GHQ and the 
CES-D self-completion questionnaire were posted to 
participants so that differences between the GHQ/ 
CES-D and the CIS-R may be attributable not only to 
the instrument but also to the mode of administration 
and setting, such as completion at home rather than 
in a clinic. A further limitation is that, although the 
majority of participants completed their postal ques- 
tionnaires shortly before their screening clinic appoint- 
ment (median 2 days apart, 87% within 1 month), 
the gap of more than a month for some participants 
may mean that the results were influenced by changes 
in symptoms. It is possible that this partially accounted 
for our results showing that sensitivity was poor for 
both mixed anxiety and depressive disorder and for 
generalized anxiety disorder. 

The strengths of this study are that our sample was 
selected randomly from a large cohort study, and was 
large enough to demonstrate that the similar severity 
scores obtained from the two methods of adminis- 
tration were consistent for men and women, across 
age groups, for different employment grades and for 
people with and without a chronic physical illness. 
Additionally, we demonstrated that it is feasible to 
use a computerized self-completion version in studies 
of older participants as response rates were identical 
for the two versions. 

An advantage of self-completion instruments is that 
they are less expensive to administer than interviewer 
instruments. At the time of writing this paper, 
more than 1500 participants had been screened in the 
sixth medical examination of the Whitehall II study. 
Respondents attended the clinic where physiological 
measures, blood tests, cognitive function and the self- 
completion version of the CIS-R were administered. 
A member of the clinic staff introduced the respondent 
to the self-completion computerized CIS-R version. 
This took no more than a few minutes. Several com- 
puters were available in a quiet room so that up to 
six respondents could complete the CIS-R at any one 
time. We estimate that using the self-administered 
CIS-R procedure reduced staff costs by at least 60% 
compared to using the interviewer version, where it 
would be necessary to schedule appointments about 
30 to 45 minutes apart. Based on preliminary data 
from the first 1500 participants at phase 11, 0.5% 
were given reading glasses and 0.5% were helped by 
clinic staff because of poor eyesight or physical diffi- 
culty using a computer. 



Implications 

Taken together, these findings suggest that the com- 
puterized self-completion CIS-R provides a feasible 
and less expensive alternative to the interviewer- 
administered CIS-R to identify any common mental 
disorder and depressive episode according to ICD-10. 
The CES-D and 30-item GHQ also have reasonable 
criterion validity as measures of common mental 
disorders and depression. 